% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/doaggregate.R
\name{doaggregate}
\alias{doaggregate}
\title{Summarize environmental and demographic indicators at each location and overall}
\usage{
doaggregate(
  sites2blocks,
  sites2states_or_latlon = NA,
  radius = NULL,
  countcols = NULL,
  wtdmeancols = NULL,
  calculatedcols = NULL,
  subgroups_type = "nh",
  include_ejindexes = FALSE,
  calculate_ratios = TRUE,
  extra_demog = TRUE,
  need_proximityscore = FALSE,
  infer_sitepoints = FALSE,
  called_by_ejamit = FALSE,
  updateProgress = NULL,
  silentinteractive = TRUE,
  testing = FALSE,
  showdrinkingwater = TRUE,
  showpctowned = TRUE,
  ...
)
}
\arguments{
\item{sites2blocks}{data.table of distances in miles between all sites (facilities) and
nearby Census block internal points, with columns ejam_uniq_id, blockid, distance,
created by the getblocksnearby() function.
See the \link{testoutput_getblocksnearby_10pts_1miles} dataset in the package for an example of input to this function.}

\item{sites2states_or_latlon}{data.table or just data.frame,
with columns ejam_uniq_id (each unique one in sites2blocks) and
ST (2-character State abbreviation) or lat and lon}

\item{radius}{Optional radius in miles to limit analysis to. By default this function uses
all the distances that were provided in the output of getblocksnearby(),
and reports radius estimated as rounded max of distance values in inputs to doaggregate.
But there may be cases where you want to run getblocksnearby() once for 10 miles, say,
on a very long list of sites (1,000 or more, say), and then get summary results for
1, 3, 5, and 10 miles without having to redo the getblocksnearby() part for each radius.
This lets you just run getblocksnearby() once for the largest radius, and then query those
results to get doaggregate() to summarize at any distance that is less than or equal to the
original radius analyzed by getblocksnearby().}

\item{countcols}{character vector of names of variables to aggregate within a buffer
using a sum of counts. One example is the number of people for whom a
poverty ratio is known, which is the exact denominator needed
to correctly calculate percent low income.}

\item{wtdmeancols}{character vector of names of variables to aggregate within a buffer
using a population weighted mean or other type of weighted mean.}

\item{calculatedcols}{character vector of names of variables to aggregate within a buffer
using formulas that have to be specified.}

\item{subgroups_type}{Optional (uses default). Set this to
"nh" for non-Hispanic race subgroups, as in Non-Hispanic White Alone (nhwa) and others in names_d_subgroups_nh;
"alone" for EJScreen v2.2 style race subgroups, as in White Alone (wa) and others in names_d_subgroups_alone;
"both" for both versions. "original" or "default" may become another option, but that is a work in progress.}

\item{include_ejindexes}{whether to calculate EJ Indexes and return that information}

\item{calculate_ratios}{whether to calculate and return ratio of each indicator to its US and State overall mean}

\item{extra_demog}{whether to include more indicators from the EJScreen v2.2 report,
on language, more age groups, gender, percent with disability, poverty, etc.}

\item{need_proximityscore}{whether to calculate proximity scores}

\item{infer_sitepoints}{set to TRUE to try to infer the lat,lon of each site around which the blocks in sites2blocks were found.
lat,lon of each site will be approximated as average of nearby blocks, although a more accurate slower way would
be to use reported distance of each of 3 of the furthest block points and triangulate}

\item{called_by_ejamit}{Set to TRUE by ejamit() to suppress some outputs even if ejamit(silentinteractive = FALSE)}

\item{updateProgress}{progress bar function used for shiny app}

\item{silentinteractive}{Set to TRUE to prevent long output showing in console in RStudio when in interactive mode.
Set to FALSE to see results in RStudio console.}

\item{testing}{used while testing this function}

\item{...}{additional arguments that could be passed to another function. Not currently used.}
}
\value{
list with named elements:
\itemize{
\item \strong{\code{results_overall}}   one-row data.table, like results_bysite, but with
aggregated results for all unique residents.
\item \strong{\code{results_bysite}}   results for individual sites (buffers): a data.table of results,
one row per ejam_uniq_id, one column per indicator.
\item \strong{\code{results_bybg_people}}   results for each block group, to allow for showing the distribution of each
indicator across everyone within each demographic group.
\item \strong{\code{longnames}}   descriptive long names for the indicators in the above outputs.
\item \strong{\code{count_of_blocks_near_multiple_sites}}   additional detail.
}
}
\description{
Used by ejamit() and the shiny app to summarize blockgroup scores at each site and overall.
}
\details{
\code{\link[=getblocksnearby]{getblocksnearby()}} and \code{doaggregate()} are the two key functions that \code{\link[=ejamit]{ejamit()}} relies on.
\code{doaggregate()} takes a set of sites, such as facilities, and the
set of blocks that are near each,
combines those with indicator scores for block groups, and
aggregates the numbers within each place and across all places overall.

For all examples, see \code{\link[=getblocksnearbyviaQuadTree]{getblocksnearbyviaQuadTree()}}

\code{doaggregate()} is the code run after \code{\link[=getblocksnearby]{getblocksnearby()}} (or a related function for
polygons or FIPS Census units) has identified which blocks are nearby.

\code{doaggregate()} aggregates the blockgroup scores to create a summary of each indicator,
as a raw score and US percentile and State percentile,
in each buffer (i.e., near each facility):
\itemize{
\item \strong{SUMS OF COUNTS}: for population count, or number of households or Hispanics, etc.
\item \strong{POPULATION-WEIGHTED MEANS}: for environmental indicators, but also any percentage indicator
for which the universe (denominator) is population count (rather than households, persons age 25up, etc.)

\emph{\strong{EJ Indexes}:} The population-weighted mean of EJ Index raw scores.
\item \strong{CALCULATED BY FORMULA}: Buffer or overall score calculated as weighted mean of percentages, where the weights are
the correct denominator, such as the count of those for whom the poverty ratio is known.
\item \strong{LOOKED UP}: Aggregated scores are converted into percentile terms via lookup tables (US or State version).
}
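
As a rough illustration of the first three aggregation types (not the actual \code{doaggregate()} code),
the following base R sketch uses a small made-up table for a single buffer.
The column names bgwt, pop, povknownratio, lowinc, and pm, and all of the values, are hypothetical
and chosen only for this example; bgwt stands for the estimated fraction of each blockgroup's
residents who are inside the buffer.

\preformatted{# Hypothetical blockgroup-level inputs for one buffer (all values made up)
bg <- data.frame(
  bgwt          = c(1.0, 0.5, 0.25),    # fraction of each blockgroup inside the buffer
  pop           = c(1000, 2000, 4000),  # population count
  povknownratio = c(900, 1800, 4000),   # people for whom the poverty ratio is known
  lowinc        = c(450, 360, 400),     # people counted as low income
  pm            = c(8, 10, 12)          # an environmental indicator score
)

# SUMS OF COUNTS: each count is scaled by the share of the blockgroup in the buffer
pop_in_buffer      <- sum(bg$bgwt * bg$pop)
povknown_in_buffer <- sum(bg$bgwt * bg$povknownratio)
lowinc_in_buffer   <- sum(bg$bgwt * bg$lowinc)

# POPULATION-WEIGHTED MEAN: the weights are the residents inside the buffer
pm_in_buffer <- weighted.mean(bg$pm, w = bg$bgwt * bg$pop)

# CALCULATED BY FORMULA: share that is low income, using its own denominator
pctlowinc_in_buffer <- lowinc_in_buffer / povknown_in_buffer
}

Dividing by povknown_in_buffer rather than pop_in_buffer is the point of the third step:
the share that is low income is defined only over people whose poverty ratio is known.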

This function requires the following datasets:
\itemize{
\item \link{blockwts}: data.table with these columns: blockid, bgid, blockwt
\item \link{quaddata}: data.table used to create localtree, a quad tree index of block points
(and the localtree index that is created when the package is loaded)
\item \link{blockgroupstats}: a data.table (such as EJScreen demographic and environmental data by blockgroup)
}
}
\section{\strong{Identification of nearby residents -- methodology:}}{
EJAM uses the same approach as EJScreen does to identify the count and demographics of nearby residents,
so EJScreen technical documentation should be consulted on the approach,
at \href{https://www.epa.gov/ejscreen/technical-information-about-ejscreen}{EJScreen Technical Info}.
EJAM implements that approach using faster code and data formats, but it
still uses the same high-resolution approach as described in EJScreen documentation
and summarized below.

The identification of nearby residents is currently done in a way that includes all 2020 Census blocks whose
"internal point" (a lat/lon provided by the Census Bureau) is within the specified distance of the facility point.
This is taken from the EJScreen block weights file, but can also be independently calculated.
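
The distance test described above can be sketched in base R as follows.
This is only a brute-force illustration using the haversine great-circle formula;
EJAM itself relies on a quad tree index of block points (see \link{quaddata}).
The helper name haversine_miles, the coordinates, and the Earth radius constant
are assumptions made for this example and are not part of the package.

\preformatted{# Great-circle distance in miles between one point and a vector of points
# (hypothetical helper; assumes a mean Earth radius of about 3959 miles)
haversine_miles <- function(lat1, lon1, lat2, lon2, radius_earth = 3959) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_earth * asin(pmin(1, sqrt(a)))
}

# Made-up facility location and block internal points
site <- list(lat = 40.0, lon = -75.0)
blocks <- data.frame(
  blockid = 1:3,
  lat = c(40.005, 40.010, 40.050),
  lon = c(-75.000, -75.010, -75.000)
)

blocks$distance <- haversine_miles(site$lat, site$lon, blocks$lat, blocks$lon)

# Keep only blocks whose internal point is within the chosen distance (miles)
radius <- 1
nearby <- blocks[blocks$distance <= radius, ]
}

In this toy example the third block point lies several miles north of the site, so only the first two blocks are kept.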

The summary or aggregation or "rollup" within the buffer is done by calculating the
population-weighted average block group score among all the people residing in the buffer.
The weighting is by population count for variables that are fractions of population,
but other denominators and weights (e.g., households count) are used as appropriate,
as explained in EJScreen technical documentation on the formulas, and
replicated by formulas used in EJAM functions such as doaggregate().

Since the blockgroup population counts are from American Community Survey (ACS) estimates,
but the block population counts are from a decennial census, the two population totals for a blockgroup can differ.
The amount each partial blockgroup contributes to the buffer's overall score is based on
the estimated number of residents from that blockgroup who are in the buffer.
This is based on the fraction of the blockgroup population that is estimated to be in the buffer,
and that fraction is calculated as the fraction of the blockgroup's decennial census block population
that is in the census blocks inside the buffer.
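
A worked example with made-up numbers may help. In this base R sketch, blockwt is each block's share of
its parent blockgroup's decennial census population (as in the \link{blockwts} dataset), while the
block counts, the ACS estimate acs_pop, and the inside flag are all hypothetical.

\preformatted{# One blockgroup made up of four blocks (decennial census counts are made up)
blocks <- data.frame(
  blockid   = 1:4,
  block_pop = c(100, 300, 400, 200),
  inside    = c(TRUE, TRUE, FALSE, FALSE)  # is the internal point inside the buffer?
)

# Block weight: the block's share of the blockgroup's decennial population
blocks$blockwt <- blocks$block_pop / sum(blocks$block_pop)

# Fraction of the blockgroup estimated to be inside the buffer
bg_fraction_inside <- sum(blocks$blockwt[blocks$inside])

# Apply that fraction to the ACS population estimate for the blockgroup,
# which need not equal the decennial total of 1000
acs_pop <- 1100
residents_in_buffer <- bg_fraction_inside * acs_pop
}

Because the fraction comes from decennial block counts but is applied to the ACS estimate,
the result stays consistent with the ACS-based blockgroup indicators even when the two totals disagree.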

Each block is treated as entirely inside or entirely outside the buffer,
and these block-level assignments are used to estimate what fraction of a given block group's
population is inside the buffer. This is more accurate and faster than areal apportionment of block groups.
Census blocks are generally so small relative to typical buffers that this is very accurate.
It is least accurate if a very small buffer distance is specified
in an extremely low density rural area where a block can be geographically large.
Although it is rarely if ever a significant issue (for reasonable, useful buffer sizes),
an even more accurate approach in those cases might be either areal apportionment of blocks,
which is very slow and assumes residents are evenly spread out across the full block's area,
or else an approach that uses higher resolution estimates of residential locations than even
the Decennial census blocks can provide, such as a dasymetric map approach.
}

\seealso{
\link{ejamit}   \code{\link[=getblocksnearby]{getblocksnearby()}}
}
